0.1 Overview

In this weeks assignment we will show how to create maps which contain geo spacial information. First, we will build a scatter map which shows information about various gas stations in the United States. After that, we will narrow down our view to the city of Philadelphia; analyzing crime data from the years of 2014-2015. Let’s see what kind of story we can extract by visualizing our data sets.

0.2 Gas Station Source Data

It’s important to describe any data set before working with it, so let’s begin by analyzing the raw gas station data. In the previous weeks assignements, we have loaded our data via having the csv in a local working directory. For this week, data was uploaded to my github. Here is how we can load the data via https.

gas_data <- read.csv("https://jmartin12.github.io/STAT553/data/gas_station_data.csv", header = TRUE)

Simple as that! First, let’s view the raw data in it’s table format, along with seeing how many rows we have.

  X site_row_id STATE      county         ADDRESS     CITY   ycoord    xcoord
1 1  1-3R8J-494    CA Los Angeles 37120 47TH ST E PALMDALE 34.55584 -118.0452
2 2  1-3R8J-362    WA    Franklin  1212 N 4TH AVE    PASCO 46.23890 -119.0950
                     SITE_DESCRIPTION service_or_fuel diesel
1 Los Angeles-Long Beach-Santa Ana CA            Fuel      Y
2         Kennewick-Pasco-Richland WA            Fuel      N
  twentyfour_hour_flag car_wash truckstop_flag description PUMP_TECH POC HIFCA
1                    Y        N              Y       URBAN         O   0     0
2                    N        N              N       URBAN         O   0     1
  ZIPnew POCAGE POCGAP ZIPPOC HFG  MSA dist.to.poc cate.poc.density
1  93552     NA     NA      0   1 4480   8.2275601       (-1e-06,1]
2  99301     NA     NA      1   0 6740   0.2788194       (-1e-06,1]
  cate.poc.age cate.poc.age.20 cate.poc.intensity cate.poc.intensity.tot
1     (15,140]        (15,140]            (5,Inf]                (8,Inf]
2     (15,140]        (15,140]            (5,Inf]                (8,Inf]
  MSA_POC MSA_POC.1
1       1         1
2       0         0
[1] 72798

While there are many columns in this data set, what’s important to notice for this weeks lesson is that we have two columns titled as ycoord and xcoord. These columns represent the coordinates in latitude and longitude, respectively. We will use both of these columns later when creating our map.

It’s also important to note the size of the data set; a mere 70K rows! If we plotted all of these on our map, it would be cluttered and difficult to interpret. Let’s reduce our data set size to simply 500 random rows. The code below demonstrates how we do this:

# Set the seed so you get the same rows I got.
set.seed(123)

# Get the 500 random rows
reduced_gas_data <- gas_data[sample(nrow(gas_data), 500), ]

# Print the row count
print(nrow(reduced_gas_data))
[1] 500

Success! We have reduced our data to 500 random rows.

0.3 Creating a Basic Geo Spacial Informative Scatter Map

For this week, we will focus on creating maps using the leaflet library. Please view the generated map, while reading an explanation of the code below.

# Create a leaflet map
gas_map <- leaflet() %>%
  addTiles() %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 3)

# Add markers for each gas station
for (i in 1:nrow(reduced_gas_data)) {
  gas_meta = paste('state: ',reduced_gas_data$STATE[i], '\n county: ', reduced_gas_data$county[i], '\n city: ' ,reduced_gas_data$CITY[i] ,'\n address: ', reduced_gas_data$ADDRESS[i])
  
  gas_map <- addMarkers(
    map = gas_map,
    lng = reduced_gas_data$xcoord[i],
    lat = reduced_gas_data$ycoord[i],
    label = htmltools::HTML(htmlEscape(gas_meta))
    )
}

gas_map

Walking through the code, we first start out with gas_map <- leaflet() %>% addTiles() %>% setView(lng = -98.5795, lat = 39.8283, zoom = 3).
The one thing that should be noted is that the default view was taken as the center point of the USA. Since our reduced data was randomly taken from the original 70K rows, using the mean of the lng and lat values will not give us a consistently good view, because the view is then subject to what gas stations we have randomly chosen. To circumvent this, the center point of the USA was used.

Aside to that, we then use a basic for loop control structure, in combination with the paste(...) function to add location metadata using the city, state, county, and address from our data set.

This is a primitive example of how the leaflet library can be used to graph geographical information. Let’s move onto a more advanced example so that we are able to extract more useful information and decipher a story therein.

0.4 Philly Crime Data

First, we will need a new data set that is hosted on my github.

philly_data <- read.csv("https://jmartin12.github.io/STAT553/data/PhillyCrimeSince2015.csv", header = TRUE)

An example few rows are given:

       dc_key                      race    sex    fatal           date
1 2.02422E+11      Black (Non-Hispanic) Female Nonfatal 3/3/2024 14:49
2 2.02426E+11 Hispanic (Black or White)   Male Nonfatal 3/1/2024 22:18
  has_court_case age   street_name block_number zip_code council_district
1             No  20 N COLORADO ST         2500    19132                5
2             No  58 N FRANKLIN ST         2600    19133                5
  police_district                       neighborhood house_district
1              22                  Sharswood-Stanton            181
2              26 Northern Liberties-West Kensington            197
  senate_district         school_catchment       lng      lat
1               3 Tanner G. Duckrey School -75.16060 39.99166
2               3 John F. Hartranft School -75.14468 39.99152

The total amount of rows are:

[1] 15520

Philly is definitely known for crime in certain areas. Let’s focus in on the year 2023. To accomplish this, we will have to perform some data transformations against the original date column. First, the following code parses out the year portion, and stores it in a vector.

years <- character(nrow(philly_data))

for (i in 1:nrow(philly_data)) {
  date_string <- philly_data$date[i]
  date_object <- strptime(date_string, "%m/%d/%Y %H:%M")
  years[i] <- format(date_object, "%Y")
}

head(years)
[1] "2024" "2024" "2024" "2024" "2024" "2024"

Great! Now we have all the years. Let’s add it as a new column to our original philly crime dataset.

philly_data$year <- years
head(philly_data, 2)
       dc_key                      race    sex    fatal           date
1 2.02422E+11      Black (Non-Hispanic) Female Nonfatal 3/3/2024 14:49
2 2.02426E+11 Hispanic (Black or White)   Male Nonfatal 3/1/2024 22:18
  has_court_case age   street_name block_number zip_code council_district
1             No  20 N COLORADO ST         2500    19132                5
2             No  58 N FRANKLIN ST         2600    19133                5
  police_district                       neighborhood house_district
1              22                  Sharswood-Stanton            181
2              26 Northern Liberties-West Kensington            197
  senate_district         school_catchment       lng      lat year
1               3 Tanner G. Duckrey School -75.16060 39.99166 2024
2               3 John F. Hartranft School -75.14468 39.99152 2024

Easy as that! We can see the year column is now added to the data set. To narrow down a subset of 2023 data only, we can reduce our data set by filtering on the newly added year column.

philly_data_2023 <- subset(philly_data, year == 2023)
head(philly_data_2023$date, 5)
[1] "12/31/2023 4:57"  "12/31/2023 3:35"  "12/31/2023 3:13"  "12/30/2023 22:56"
[5] "12/30/2023 11:32"

Only the date column is shown in the above example – the remaining columns are still stored in memory. We can clearly see there are only dates in 2023.

0.5 Philly Crimes - Mapped

Now that we have our 2023 data, let’s go ahead and use leafly to map our data. The code below adds two additional columns to the dataset so we can easily determine what colors to use, and in addition generating the label information.

From there, a title along with a legend is added to the graph.

colors <- character(nrow(philly_data_2023))
labels <- character(nrow(philly_data_2023))
 
for (i in 1:nrow(philly_data_2023)) {
  # Handle the colors
  if (philly_data_2023$fatal[i] == 'Fatal') {
    colors[i] <- '#000000'
  }
  else {
    colors[i] <- '#CC79A7'    
  }
  
  # Handle the info to display in the label
  labels[i] <- paste('Gender: ', philly_data_2023$sex[i], 'Neighborhood: ', philly_data_2023$neighborhood[i], 'Block Number: ', philly_data_2023$block_number[i])
}

philly_data_2023$crime_type_color <- colors
philly_data_2023$label <- labels

## Map title
title <- tags$div( HTML('<font color = "darkred" size =4><b>Philly Fatal and Non-Fatal Crimes 2023</b></font>')
)

# Create a Leaflet map
m <- leaflet(philly_data_2023) %>%
  addTiles() %>%
  setView(lat = 40, lng = -75.1652, zoom = 11) %>%
  addCircleMarkers(
    lng = ~lng,
    lat = ~lat,
    color = ~crime_type_color,
    radius = 3,
    label = ~label,
    labelOptions = labelOptions(noHide = FALSE, textOnly = FALSE)) %>%
  addControl(title, position = "topright") %>%
  addLegend(position = "bottomleft", 
            colors = c("#CC79A7", "#000000"),
            labels= c("Non Fatal", "Fatal"),
            title= "Crime Type",
            opacity = 0.8)

m



0.6 Narration

It is particularly interesting to see the distribution of crimes across the city of Philly. There are a few hotspots of fatal crimes. The primary hotspot being the southwest region of the city, followed by a smaller hotspot just north of the city.

The dataset itself can tell us more with different graphs. Let’s view the % of crimes committed by gender.

# Count the number of crimes per gender
crimes_per_gender <- philly_data_2023 %>%
  group_by(sex) %>%
  summarise(crime_count = n())

# Calculate the percentage of each gender
crimes_per_gender$percentage <- (crimes_per_gender$crime_count / sum(crimes_per_gender$crime_count)) * 100

# Create a pie chart
p3 <- plot_ly(crimes_per_gender, labels = ~sex, values = ~percentage, type = 'pie') %>%
  layout(title = "% of Crimes Committed by Males vs. Females")

# Print the plot
p3

It seems that males commit most of the crimes in the city of Philadelphia. Something else of interest could be to analyze which school districts have the highest crime count. In particular, parents could find this data to be of use when determining what school district to send their kids to.

  # Count the number of crimes per school district
  crimes_per_district <- philly_data_2023 %>%
    group_by(school_catchment) %>%
    summarise(crime_count = n()) %>%
    top_n(10, crime_count)

  
  # Create a bar graph
  p <- plot_ly(crimes_per_district, y = ~school_catchment, x = ~crime_count, type = 'bar') %>%
    layout(title = "Number of Crimes per School District in Philadelphia (2023)",
           xaxis = list(title = "School District"),
           yaxis = list((title = "Number of Crimes")))
  
  # Print the plot
  p

These ten schools are the schools with the highest number of reported crimes, indicating a potentially higher crime rate in these areas. This data could be influential when informing parents about the safety risks associated with a particular school district, helping them make informed decisions about their children’s education and well-being.

---
title: "Week 7 - Geo Spacial Data Visualization"
author: "Jacob Martin"
date: "West Chester University "
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    fig_width: 6
    number_sections: yes
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: true
    theme: readable
    fig_height: 4
---

```{=html}
<style type="text/css">

div#TOC li {
    list-style:none;
    background-color:lightgray;
    background-image:none;
    background-repeat:none;
    background-position:0;
    font-family: Arial, Helvetica, sans-serif;
    color: #780c0c;
}

/* mouse over link */
div#TOC a:hover {
  color: red;
}

/* unvisited link */
div#TOC a:link {
  color: blue;
}



h1.title {
  font-size: 24px;
  color: Darkblue;
  text-align: center;
  font-family: Arial, Helvetica, sans-serif;
  font-variant-caps: normal;
}
h4.author { 
    font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}
h4.date { 
  font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}
h1 {
    font-size: 24px;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}
h2 {
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { 
    font-size: 15px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* unvisited link */
a:link {
  color: green;
}

/* visited link */
a:visited {
  color: green;
}

/* mouse over link */
a:hover {
  color: red;
}

/* selected link */
a:active {
  color: yellow;
}

</style>
```
```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
options(repos = list(CRAN="http://cran.rstudio.com/"))
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("cowplot")) {
   install.packages("cowplot")
   library(cowplot)
}
if (!require("latex2exp")) {
   install.packages("latex2exp")
   library(latex2exp)
}
if (!require("plotly")) {
   install.packages("plotly")
   library(plotly)
}
if (!require("gapminder")) {
   install.packages("gapminder")
   library(gapminder)
}
if (!require("png")) {
    install.packages("png")             # Install png package
    library("png")
}
if (!require("RCurl")) {
    install.packages("RCurl")           # Install RCurl package
    library("RCurl")
}
if (!require("colourpicker")) {
    install.packages("colourpicker")              
    library("colourpicker")
}
if (!require("gifski")) {
    install.packages("gifski")              
    library("gifski")
}
if (!require("magick")) {
    install.packages("magick")              
    library("magick")
}
if (!require("grDevices")) {
    install.packages("grDevices")              
    library("grDevices")
}
### ggplot and extensions
if (!require("ggplot2")) {
    install.packages("ggplot2")              
    library("ggplot2")
}
if (!require("gganimate")) {
    install.packages("gganimate")              
    library("gganimate")
}
if (!require("ggridges")) {
    install.packages("ggridges")              
    library("ggridges")
}
if (!require("graphics")) {
    install.packages("graphics")              
    library("graphics")
}
if (!require("tidyr")) {
   install.packages("tidyr", dependencies = TRUE)
   library(tidyr)
}
if (!require("reshape2")) {
   install.packages("reshape2", dependencies = TRUE)
   library(reshape2)
}
if (!require("leaflet")) {
   install.packages("leaflet", dependencies = TRUE)
   library(leaflet)
}
if (!require("htmltools")) {
   install.packages("htmltools", dependencies = TRUE)
   library(htmltools)
}

knitr::opts_chunk$set(echo = TRUE,       
                      warning = FALSE,   
                      result = TRUE,   
                      message = FALSE,
                      comment = NA)
```


## Overview
In this weeks assignment we will show how to create maps which contain geo spacial information. First, we will build a scatter map which shows information about various gas stations in the United States. After that, we will narrow down our view to the city of Philadelphia; analyzing crime data from the years of 2014-2015. Let's see what kind of story we can extract by visualizing our data sets.

## Gas Station Source Data
It's important to describe any data set before working with it, so let's begin by analyzing the raw gas station data. 
In the previous weeks assignements, we have loaded our data via having the csv in a local working directory. For this week, data was uploaded to my github. Here is how we can load the data via https. 

```{r echo=TRUE}
gas_data <- read.csv("https://jmartin12.github.io/STAT553/data/gas_station_data.csv", header = TRUE)
```

Simple as that! First, let's view the raw data in it's table format, along with seeing how many rows we have. 

```{r echo=FALSE}
head(gas_data, 2)
nrow(gas_data)
```

While there are many columns in this data set, what's important to notice for this weeks lesson is that we have two columns titled as `ycoord` and `xcoord`. These columns represent the coordinates in `latitude` and `longitude`, respectively. We will use both of these columns later when creating our map.

It's also important to note the size of the data set; a mere 70K rows! If we plotted all of these on our map, it would be cluttered and difficult to interpret. Let's reduce our data set size to simply 500 random rows. The code below demonstrates how we do this:

```{r echo = TRUE}
# Set the seed so you get the same rows I got.
set.seed(123)

# Get the 500 random rows
reduced_gas_data <- gas_data[sample(nrow(gas_data), 500), ]

# Print the row count
print(nrow(reduced_gas_data))
```

Success! We have reduced our data to 500 random rows.

## Creating a Basic Geo Spacial Informative Scatter Map
For this week, we will focus on creating maps using the `leaflet` library. Please view the generated map, while reading an explanation of the code below. 

```{r echo=TRUE}
# Create a leaflet map
gas_map <- leaflet() %>%
  addTiles() %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 3)

# Add markers for each gas station
for (i in 1:nrow(reduced_gas_data)) {
  gas_meta = paste('state: ',reduced_gas_data$STATE[i], '\n county: ', reduced_gas_data$county[i], '\n city: ' ,reduced_gas_data$CITY[i] ,'\n address: ', reduced_gas_data$ADDRESS[i])
  
  gas_map <- addMarkers(
    map = gas_map,
    lng = reduced_gas_data$xcoord[i],
    lat = reduced_gas_data$ycoord[i],
    label = htmltools::HTML(htmlEscape(gas_meta))
    )
}

gas_map
```

Walking through the code, we first start out with `gas_map <- leaflet() %>% addTiles() %>% setView(lng = -98.5795, lat = 39.8283, zoom = 3)`.
\
The one thing that should be noted is that the default view was taken as the center point of the USA. Since our reduced data was randomly taken from the original 70K rows, using the mean of the lng and lat values will not give us a consistently good view, because the view is then subject to what gas stations we have randomly chosen. To circumvent this, the center point of the USA was used. 

Aside to that, we then use a basic `for` loop control structure, in combination with the `paste(...)` function to add location metadata using the `city`, `state`, `county`, and `address` from our data set.

This is a primitive example of how the `leaflet` library can be used to graph geographical information. Let's move onto a more advanced example so that we are able to extract more useful information and decipher a story therein.

## Philly Crime Data

First, we will need a new data set that is hosted on my github. 

```{r echo=TRUE}
philly_data <- read.csv("https://jmartin12.github.io/STAT553/data/PhillyCrimeSince2015.csv", header = TRUE)
```

An example few rows are given: 

```{r echo=FALSE}
head(philly_data, 2)
```

The total amount of rows are: 

```{r echo=FALSE}
nrow(philly_data)
```

Philly is definitely known for crime in certain areas. Let's focus in on the year 2023. To accomplish this, we will have to perform some data transformations against the original `date` column. First, the following code parses out the `year` portion, and stores it in a vector.


```{r echo=TRUE}
years <- character(nrow(philly_data))

for (i in 1:nrow(philly_data)) {
  date_string <- philly_data$date[i]
  date_object <- strptime(date_string, "%m/%d/%Y %H:%M")
  years[i] <- format(date_object, "%Y")
}

head(years)
```

Great! Now we have all the years. Let's add it as a new column to our original philly crime dataset.

```{r echo=TRUE}
philly_data$year <- years
head(philly_data, 2)
```

Easy as that! We can see the year column is now added to the data set. 
To narrow down a subset of 2023 data only, we can reduce our data set by filtering on the newly added `year` column.

```{r echo=TRUE}
philly_data_2023 <- subset(philly_data, year == 2023)
head(philly_data_2023$date, 5)
```

Only the `date` column is shown in the above example -- the remaining columns are still stored in memory. We can clearly see there are only dates in `2023`. 
\

## Philly Crimes - Mapped
Now that we have our 2023 data, let's go ahead and use leafly to map our data. The code below adds two additional columns to the dataset so we can easily determine what colors to use, and in addition generating the label information. 

From there, a title along with a legend is added to the graph. 

```{r echo=TRUE}
colors <- character(nrow(philly_data_2023))
labels <- character(nrow(philly_data_2023))
 
for (i in 1:nrow(philly_data_2023)) {
  # Handle the colors
  if (philly_data_2023$fatal[i] == 'Fatal') {
    colors[i] <- '#000000'
  }
  else {
    colors[i] <- '#CC79A7'    
  }
  
  # Handle the info to display in the label
  labels[i] <- paste('Gender: ', philly_data_2023$sex[i], 'Neighborhood: ', philly_data_2023$neighborhood[i], 'Block Number: ', philly_data_2023$block_number[i])
}

philly_data_2023$crime_type_color <- colors
philly_data_2023$label <- labels

## Map title
title <- tags$div( HTML('<font color = "darkred" size =4><b>Philly Fatal and Non-Fatal Crimes 2023</b></font>')
)

# Create a Leaflet map
m <- leaflet(philly_data_2023) %>%
  addTiles() %>%
  setView(lat = 40, lng = -75.1652, zoom = 11) %>%
  addCircleMarkers(
    lng = ~lng,
    lat = ~lat,
    color = ~crime_type_color,
    radius = 3,
    label = ~label,
    labelOptions = labelOptions(noHide = FALSE, textOnly = FALSE)) %>%
  addControl(title, position = "topright") %>%
  addLegend(position = "bottomleft", 
            colors = c("#CC79A7", "#000000"),
            labels= c("Non Fatal", "Fatal"),
            title= "Crime Type",
            opacity = 0.8)

m
```

\
\

## Narration

It is particularly interesting to see the distribution of crimes across the city of Philly. There are a few hotspots of fatal crimes. The primary hotspot being the southwest region of the city, followed by a smaller hotspot just north of the city.

The dataset itself can tell us more with different graphs. Let's view the % of crimes committed by gender.

``` {r echo=TRUE}
# Count the number of crimes per gender
crimes_per_gender <- philly_data_2023 %>%
  group_by(sex) %>%
  summarise(crime_count = n())

# Calculate the percentage of each gender
crimes_per_gender$percentage <- (crimes_per_gender$crime_count / sum(crimes_per_gender$crime_count)) * 100

# Create a pie chart
p3 <- plot_ly(crimes_per_gender, labels = ~sex, values = ~percentage, type = 'pie') %>%
  layout(title = "% of Crimes Committed by Males vs. Females")

# Print the plot
p3
```



It seems that males commit most of the crimes in the city of Philadelphia. Something else of interest could be to analyze which school districts have the highest crime count. In particular, parents could find this data to be of use when determining what school district to send their kids to.   

```{r echo=TRUE}
  # Count the number of crimes per school district
  crimes_per_district <- philly_data_2023 %>%
    group_by(school_catchment) %>%
    summarise(crime_count = n()) %>%
    top_n(10, crime_count)

  
  # Create a bar graph
  p <- plot_ly(crimes_per_district, y = ~school_catchment, x = ~crime_count, type = 'bar') %>%
    layout(title = "Number of Crimes per School District in Philadelphia (2023)",
           xaxis = list(title = "School District"),
           yaxis = list((title = "Number of Crimes")))
  
  # Print the plot
  p
  
```

These ten schools are the schools with the highest number of reported crimes, indicating a potentially higher crime rate in these areas. This data could be influential when informing parents about the safety risks associated with a particular school district, helping them make informed decisions about their children's education and well-being.





















